Abstract

Two-sided statistical tests and p-values are well defined only when
the test statistic in question has a symmetric distribution. A new
two-sided p-value, called the conditional p-value $P_C$, is introduced
here. It is closely related to the doubled p-value and has an intuitive
appeal. Its use is advocated for both continuous and discrete
distributions. An important advantage of this p-value is that equivalent
1-sided tests are transformed into $P_C$-equivalent 2-sided tests. It is
compared to the widely used doubled and minimum likelihood p-values.
Examples include the variance test, the binomial test and Fisher's exact
test.

Keywords: two-sided tests, Fisher's exact test, variance test, binomial
test, F test, minimum likelihood
1 Introduction



Two-sided statistical tests are widely used and misused in numerous applica- 
tions of statistics. In fact, some applied journals do not accept papers quoting 
1-sided p-values anymore. Examples include The New England Journal of 
Medicine, Journal of the National Cancer Institute and Journal of Clinical 
Oncology among others. 

Unfortunately, two-sided statistical tests and p-values are well defined 
only when the test statistic in question has a symmetric distribution. The 
difficulties with two-sided p-values arise in a general case of a non-symmetric 
distribution, though they are more often commented on for discrete distri- 
butions. 

The most famous example is an ongoing discussion about how 2-sided
p-values should be constructed for Fisher's exact test. This discussion
was started in 1935 by Fisher (1935) and Irwin (1935). Numerous
developments of the next 50 years are summarised in Yates (1984) and the
discussion thereof. The more recent contributions include several
proposals based on a modified UMPU test: Lloyd (1988), Dunne et al.
(1996), Meulepas (1998). See also Agresti and Wackerly (1977), Dupont
(1986), Davis (1986), Agresti (1992).

A (far from exhaustive) list of 9 different proposals is given in
Meulepas (1998). The problem is still not resolved.



Fisher advocated doubling the 1-sided p-value in his letter to Finney in
1946 (Yates, 1984, p. 444). This doubled p-value is denoted by $P_F$.
Fisher's motivation was an equal prior weight of departure in either
direction. Other arguments for doubling include invariance under
transformation of the distribution to a normal scale, and ease of
approximation by the chi-square distribution (Yates, 1984). One of the
evident drawbacks of the doubling rule is that it may result in a p-value
greater than 1. The doubled p-value is used in the majority of statistical
software in the case of continuously distributed statistics, and often in
the discrete case.



The primary contribution of this article is the introduction of a new
method of defining two-sided p-values, to be called 'conditional two-sided
p-values' and denoted by $P_C$. The conditional p-value is closely related
to the doubled p-value and has an intuitive appeal. It is demonstrated
that this new two-sided p-value has properties which make it a definite
improvement on currently used two-sided p-values for both discrete and
continuous non-symmetric distributions.



Another popular two-sided p-value for non-symmetric discrete
distributions, implemented in computer packages, R (R Development Core
Team, 2004) in particular, adds the probabilities of the points at both
tails that are no more probable than the observed one. This p-value is
denoted by $P_{prob}$. This method was introduced in Freeman and Halton
(1951), and is based on the Neyman and Pearson (1931) idea of ordering
multinomial probabilities; this is called 'the principle of minimum
likelihood' by Gibbons and Pratt (1975), see also George and Mudholkar
(1990). Hill and Pike (1965) were the first to use this p-value for
Fisher's exact test. Many statisticians objected to this principle.
Gibbons and Pratt (1975) commented that 'The minimum likelihood method can
also lead to absurdities, especially when the distribution is U-shaped,
J-shaped, or simply not unimodal.' Radlow and Alf (1975) pointed out 'This
procedure is justified only if events of lower probability are necessarily
more discrepant from the null hypothesis. Unfortunately, this is
frequently not true.'



The following example demonstrates another unfortunate feature of this
p-value. When a density value corresponds to a high 1-sided p-value at one
tail, the point with the same density at the opposite tail cannot be
rejected, even though its own 1-sided p-value may be very low.

Example: Two-sided variance test based on the Chi-square distribution.
Suppose we have 6 observations from a normal population and wish to test
the null hypothesis that the variance $\sigma^2 = \sigma_0^2$ against a
two-sided alternative. The test statistic is
$X = (n-1)S^2/\sigma_0^2 \sim (\sigma^2/\sigma_0^2)\chi^2(5)$, where $S^2$
is the sample variance. For $X = 1$ (or $S^2 = 0.2$) the 1-sided p-value
on the left tail is 0.0374, the density is 0.0807, the value with the same
density on the right tail is $x' = 6.711$, and its 1-sided p-value is
0.2431; see the dotted lines on the left plot of Figure 1. Similarly, for
$X = 0.5$ ($S^2 = 0.1$) the density is 0.0366, the p-value on the left
tail is 0.0079, the equal-density value is 9.255, and its p-value is
0.0993 (dashed lines on the same plot). It is very difficult to reject the
null for small observed values.

Given critical values $c_{L,\alpha}$ and $c_{R,\alpha}$ on the left and
right tail such that
$\chi^2_5(c_{L,\alpha}) + 1 - \chi^2_5(c_{R,\alpha}) = \alpha$, where
$\chi^2_5(\cdot)$ denotes the distribution function, the power of a
two-sided variance test at level $\alpha$ is calculated as
$\chi^2_5(\rho c_{L,\alpha}) + 1 - \chi^2_5(\rho c_{R,\alpha})$, where
$\rho = \sigma_0^2/\sigma^2$. The power of four 0.05-level tests is
plotted in the right plot of Figure 1. The test based on $P_{prob}$ is
evidently biased, with very low power for $\rho > 1$, i.e. when
$\sigma < \sigma_0$. The minimum value of the power is 0.01.

Figure 1: Two-sided variance test with the statistic $X \sim \chi^2_5$. On
the left plot, the density of the $\chi^2_5$ distribution, with
dotted/dashed lines illustrating the calculation of $P_{prob}$ for $X = 1$
and $X = 0.5$. On the right plot, the power of the 5%-level variance tests
based on the p-values $P_{prob}(x)$ (solid line), $P_F(x)$ (dashed line),
$P_C(x)$ (dotted line), and the UMPU test (long-dashed line). The
horizontal line at 0.05 corresponds to the significance level.
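The numbers quoted in the variance example can be checked with a short script. The following is a minimal sketch using only the Python standard library; the helper names (`chi2_5_pdf`, `chi2_5_cdf`, `bisect`, `equal_density_partner`, `p_prob`) are illustrative and not part of the paper, and the closed-form distribution function is specific to 5 degrees of freedom.

```python
import math

def chi2_5_pdf(x):
    """Density of the chi-square distribution with 5 degrees of freedom."""
    if x <= 0:
        return 0.0
    return x ** 1.5 * math.exp(-x / 2) / (2 ** 2.5 * math.gamma(2.5))

def chi2_5_cdf(x):
    """Closed-form distribution function of chi-square(5)."""
    if x <= 0:
        return 0.0
    return (math.erf(math.sqrt(x / 2))
            - math.sqrt(2 / math.pi) * math.exp(-x / 2)
            * (math.sqrt(x) + x ** 1.5 / 3))

def bisect(g, lo, hi, iterations=200):
    """Root of g on [lo, hi]; g(lo) and g(hi) must have opposite signs."""
    lo_positive = g(lo) > 0
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if (g(mid) > 0) == lo_positive:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

MODE = 3.0  # the mode of chi-square(k) is k - 2 for k >= 2

def equal_density_partner(x):
    """Point on the other side of the mode with the same density as x."""
    target = chi2_5_pdf(x)
    if x < MODE:
        return bisect(lambda t: chi2_5_pdf(t) - target, MODE, 200.0)
    return bisect(lambda t: chi2_5_pdf(t) - target, 0.0, MODE)

def p_prob(x):
    """Minimum likelihood p-value: both tails beyond the equal-density pair."""
    lo, hi = sorted((x, equal_density_partner(x)))
    return chi2_5_cdf(lo) + 1 - chi2_5_cdf(hi)

for x in (1.0, 0.5):
    partner = equal_density_partner(x)
    print(x, chi2_5_cdf(x), chi2_5_pdf(x), partner,
          1 - chi2_5_cdf(partner), p_prob(x))
```

The printed rows list, in order, the observed value, its left-tail p-value, its density, the equal-density point, that point's right-tail p-value, and $P_{prob}$; the large right-tail term is what keeps $P_{prob}$ high for small observed values.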



The uniformly most powerful unbiased (UMPU) test for this example has the
critical region defined by $c_{L,.05} = 0.989$ and $c_{R,.05} = 14.37$,
corresponding to critical levels $\alpha_L = .037$ and $\alpha_R = .013$
at the left and right tail, respectively. Finally, the generalized
likelihood ratio (GLR) test is based on the statistic
$\Lambda = [(X/n)\exp(1 - X/n)]^{n/2}$, and it is biased (Stuart and Ord,
Example 23.5, p. 882). This is not exceptional; Bar-Lev et al. (2002)
showed that for a continuous exponential family F on the real line, the
GLR and UMPU tests coincide if and only if, up to an affine
transformation, F is either a normal, inverse Gaussian or gamma family.

The new conditional 2-sided p-value $P_C$ is formally defined in the next
section. The power of the tests based on the doubled and conditional
p-values for the chi-square example is also plotted in Figure 1. They are
much less biased, with minimum power of 0.045 and 0.048 respectively.

The formal definition of the conditional 2-sided p-value $P_C$ and the
comparison of its properties to those of the doubled p-value and
$P_{prob}$ are given in Section 2 for continuous distributions, and in
Section 3 for discrete distributions (binomial and hypergeometric).
Discussion is in Section 4. The use of the conditional 2-sided p-value
$P_C$ is advocated for both continuous and discrete distributions. An
important advantage of this p-value is that equivalent 1-sided tests are
transformed into $P_C$-equivalent 2-sided tests.



2 Two-sided p-values for continuous asymmetric distributions

Consider a general case of a statistic X with a strictly increasing
continuous null distribution $F(x)$ with continuous density $f(x)$. For an
observed value x of X, the one-sided p-value on the left tail is defined
as $P(X' < x \mid X = x) = F(x)$, where $X' \sim F(x)$ is independent of
X. Similarly, on the right tail the p-value is
$P(X' > x \mid X = x) = 1 - F(x)$. Denote by A a generic location
parameter chosen to separate the two tails of the distribution F.
Particular examples include the mean $E = E(X)$, the mode
$M = \arg\sup_x f(x)$, or the median $m = F^{-1}(1/2)$. Which parameter
should be used to separate the two tails depends on the context; the mean
seems to be the most appropriate when a test statistic is based on an
estimate of a natural parameter in an exponential family, as is the case
with the binomial or Fisher's exact test. The general theory below is
applicable regardless of the parameter chosen, though the details of
examples may differ. Interestingly, the choice does not matter much for
the most important non-symmetric discrete distributions: the mean, when
attainable, coincides with the mode (or one of the two modes) for the
Poisson, binomial and hypergeometric distributions. The latter two
distributions are discussed in Section 3.



Definition 1 Weighted two-tailed p-value centered at A with weights
$w = (w_L, w_R)$ satisfying $w_L + w_R = 1$ is defined as

$P_w^A(x) = \min\left(\frac{F(x)}{w_L}, 1\right) I(x < A) + \min\left(\frac{1 - F(x)}{w_R}, 1\right) I(x > A).$   (1)

The doubled p-value, denoted by $P_F(x)$, has weights 1/2. Without loss of
generality assume that $A > m$. Then the doubled p-value $P_F(x)$ is equal
to $2F(x)$ for $x < m$, 1 for $m < x < A$, and $2(1 - F(x))$ for $x > A$.
Thus the doubled p-value is not continuous at A unless $m = A$; its
derivative is also discontinuous at m.

Similarly, a weighted p-value $P_w^A(x)$ is continuous at A iff
$w_L/w_R = F(A)/(1 - F(A))$, and an additional requirement of
$P_w^A(A) = 1$ results in $w_L = F(A)$ and $w_R = 1 - F(A)$, arriving at
the next definition.

Definition 2 Conditional 2-sided p-value centered at A is defined as

$P_C^A(x) = \frac{F(x)}{F(A)} I(x < A) + \frac{1 - F(x)}{1 - F(A)} I(x > A) = P(X' < x \mid X = x < A) + P(X' > x \mid X = x > A).$   (2)

This is a smooth function of x (except at A), with a maximum of 1 at A. It
strictly increases for $x < A$ and decreases for $x > A$. The conditional
p-value is conceptually close to the doubled p-value, the only difference
being that the two tails are weighted inversely proportionally to their
probabilities. This results in inflated p-values on the thin tail, and
deflated p-values on the thick tail, when compared to the doubled p-value.
When the tails are defined with respect to the median, the two p-values
coincide: $P_F(x) = P_C^m(x)$. Thus the conditional p-value is equal to
the usual doubled p-value for a symmetric distribution. It is easy to see
that under the null hypothesis $P_C^A(X)$ is uniformly distributed on
[0, 1] given a particular tail, i.e.
$P_0(P_C^A(X) < p \mid X < A) = p$, similar to a 1-sided p-value.
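The uniformity claim can be written out in one line. The derivation below is a sketch for the left tail under the assumptions of this section (continuous, strictly increasing F, so that $F(X)$ is uniform on [0, 1] under the null); the right tail is analogous with $1 - F$ in place of F.

```latex
% Left tail: for x < A the conditional p-value is F(x)/F(A).
% For any p in [0,1], using F(X) ~ U(0,1) under the null and pF(A) <= F(A):
\begin{align*}
P_0\bigl(P_C^A(X) \le p \mid X < A\bigr)
  &= \frac{P_0\bigl(F(X) \le p\,F(A),\; X < A\bigr)}{P_0(X < A)} \\
  &= \frac{P_0\bigl(F(X) \le p\,F(A)\bigr)}{F(A)}
   = \frac{p\,F(A)}{F(A)} = p .
\end{align*}
```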

There is an evident connection between a choice of a two-sided p-value and
a critical region (CR) for a two-sided test at level $\alpha$. A CR is
defined through critical values corresponding to probabilities
$\alpha_1 = w_L\alpha$ and $\alpha_2 = w_R\alpha$, with the weights of the
two tails $w_L + w_R = 1$. It can equivalently be defined through a
weighted p-value as $\{x : P_w^A(x) < \alpha\}$. The doubled p-value
corresponds to $w_L = w_R = 1/2$. The conditional p-value is equivalent to
the choice $w_L = F(A)$, $w_R = 1 - F(A)$.

For a two-sided test, critical values $c_{L,\alpha}$ and $c_{R,\alpha}$
satisfy $F(c_{L,\alpha}) = w_L\alpha$ and
$1 - F(c_{R,\alpha}) = w_R\alpha$. Thus
$w_L = w_L(\alpha) = F(c_{L,\alpha})/\alpha$. Define
$A = A(\alpha) = F^{-1}(F(c_{L,\alpha})/\alpha)$. Then the CR is
$\{x : P_C^A(x) < \alpha\}$. Therefore any 2-sided test, a UMPU test
inclusive, is a test based on the conditional p-value centered at some
$A = A(\alpha)$. Conversely, if the value A is chosen to be independent of
$\alpha$, the resulting test is, in general, biased. Since independence
from the level $\alpha$ is a natural requirement for a p-value, some bias
cannot be escaped.
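As an illustration of the map from critical values to the centre $A(\alpha)$, the sketch below recovers the left-tail weight and the centre for the UMPU critical values quoted in the Introduction for the $\chi^2(5)$ example ($c_L = 0.989$, $c_R = 14.37$). The function names are hypothetical, and the quantile is obtained by plain bisection rather than a library routine.

```python
import math

def chi2_5_cdf(x):
    """Closed-form distribution function of chi-square(5)."""
    if x <= 0:
        return 0.0
    return (math.erf(math.sqrt(x / 2))
            - math.sqrt(2 / math.pi) * math.exp(-x / 2)
            * (math.sqrt(x) + x ** 1.5 / 3))

def chi2_5_quantile(p, lo=0.0, hi=200.0, iterations=200):
    """Inverse of chi2_5_cdf by bisection (the cdf is strictly increasing)."""
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if chi2_5_cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

ALPHA = 0.05
C_LEFT, C_RIGHT = 0.989, 14.37          # UMPU critical values from the text

alpha_left = chi2_5_cdf(C_LEFT)         # probability of the left rejection region
alpha_right = 1 - chi2_5_cdf(C_RIGHT)   # probability of the right rejection region
w_left = alpha_left / ALPHA             # w_L(alpha) = F(c_L) / alpha
centre = chi2_5_quantile(w_left)        # A(alpha) = F^{-1}(F(c_L) / alpha)

print(alpha_left, alpha_right, w_left, centre)
```

Because the critical values are rounded to three or four digits, the recovered weight and centre agree with the values reported later in the text only to about two decimals.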



Lemma 1 For a one-parameter exponential family $F(x, \theta)$, a two-sided
level-$\alpha$ test based on the conditional p-value $P_C^A$ is less
biased in the neighborhood of the null value $\theta_0$ than the standard
equal tails test based on the doubled p-value whenever
$F(A) \in (1/2, w^*_{L,\alpha}]$, where $w^*_{L,\alpha}$ is the weight at
the left tail of the UMPU test.

Proof. Denote the test critical function by $\phi(x)$. This is an
indicator function of the CR, so $E_{\theta_0}[\phi(X)] = \alpha$, and the
power is $\beta(\theta) = E_\theta[\phi(X)]$. Without loss of generality X
is the sufficient statistic. The derivative of the power is (Lehmann,
1959, p. 127)

$\beta'(\theta) = E_\theta[X\phi(X)] - E_\theta(X) E_\theta[\phi(X)].$   (3)

For a UMPU test $\beta'(\theta_0) = 0$. For a test with weight $w_L$ at
the left tail,

$\beta'(\theta_0) = \int_{-\infty}^{F^{-1}(\alpha w_L)} x\,dF + \int_{F^{-1}(1 - \alpha(1 - w_L))}^{\infty} x\,dF - \alpha E.$

For $\alpha < 1$, this is a strictly decreasing function of $w_L$, equal
to zero at $w^*_{L,\alpha}$. When $1/2 < w^*_{L,\alpha}$, any
$w_L \in (1/2, w^*_{L,\alpha})$ provides positive values of
$\beta'(\theta_0)$, and when $1/2 > w^*_{L,\alpha}$ the values of
$\beta'(\theta_0)$ are negative; in any case the gradient is the steepest
and the bias is the largest at 1/2, as required.

Lemma 1 provides a sufficient condition for the $P_C$-based test to be
less biased than the equal tails test, but this condition is not
necessary. This condition holds for the $\chi^2$ distribution, and the
variance test based on $P_C^E(x)$ is uniformly (in n) less biased than the
test based on the doubled p-value, see the left plot of Figure 2. The
doubled p-value based test is asymptotically UMPU (Shao, 1999), and so is
the $P_C$-based test. In the two-sample case, the equal-tails F-test of
the equality of variances is UMPU for equal sample sizes, and the
$P_C$-based test is less biased when the ratio of sample sizes is larger
than 1.7 (starting from n = 6), whereas Lemma 1 holds for even more
unbalanced sample sizes with the ratio of 2.5 or above, see the right plot
of Figure 2.

Figure 2: Bias of the $P_F(x)$-based variance test at 5% level (dashed
line), and of the $P_C^E(x)$-based test (dotted line) in the 1-sample case
($\chi^2$-test, left plot) and in the 2-sample case with $n_1 = 6$
(F-test, right plot).

Finally, consider the minimum likelihood p-value.

Definition 3 Minimum likelihood p-value is
$P_{prob}(x) = P(f(X) \le f(x))$.

$P_{prob}(x)$ reaches 1 at the mode, and $P_{prob}(A) < 1$ whenever
$A \ne M$. It is not a unimodal function of x when the density $f(x)$ is
not unimodal. In the case of a unimodal distribution, for a pair of
conjugate points $(x, x')$ with $x < M < x'$ and $f(x') = f(x)$, it is
calculated as $P_{prob}(x) = F(x) + 1 - F(x')$. It has a Unif(0, 1)
distribution under the null.

When used to define a test, the acceptance region
$\{x : P_{prob}(x) > \alpha\}$ contains the points with the highest
density, and is therefore of minimum length. Inverting this test results
in the shortest confidence intervals, see Sterne (1954) for the binomial
and Baptista and Pike (1977) for the hypergeometric distribution. It is
also related to Bayes shortest posterior confidence intervals, see Wilson
and Tonascia (1971) for the intervals for the standard deviation $\sigma$
and the ratio of variances in normal populations, based on the inverse chi
and F distributions, respectively.

The next three examples clarify the properties of the conditional p-value
$P_C$ in comparison to $P_{prob}$.



Example: Triangular distribution

Suppose that the null density is given by $f(x) = 2(x + a)/[a(a + b)]$ for
$-a < x < 0$ and $f(x) = 2(b - x)/[b(a + b)]$ for $0 < x < b$. The mode is
$M = 0$, and $F(0) = a/(a + b)$. Then for $x < 0$,
$P_{prob}(x) = F(x)/F(0)$, and for $x > 0$,
$P_{prob}(x) = (1 - F(x))/(1 - F(0))$, see George and Mudholkar (1990).
Thus $P_C^M(x) = P_{prob}(x)$. This is the only unimodal distribution for
which this equality holds, as it requires the linearity of the density
$f(x)$.
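The equality $P_C^M = P_{prob}$ for the triangular density can be verified numerically. The sketch below assumes illustrative values a = 1 and b = 3 (not taken from the paper) and uses the fact that, for this density, the equal-density partner of $x < 0$ is $x' = -bx/a$ and that of $x > 0$ is $x' = -ax/b$.

```python
A_LEFT, B_RIGHT = 1.0, 3.0   # assumed support [-a, b] with mode at 0

def tri_cdf(x, a=A_LEFT, b=B_RIGHT):
    """Distribution function of the triangular density with mode 0."""
    if x <= -a:
        return 0.0
    if x <= 0:
        return (x + a) ** 2 / (a * (a + b))
    if x < b:
        return 1 - (b - x) ** 2 / (b * (a + b))
    return 1.0

def p_cond(x, a=A_LEFT, b=B_RIGHT):
    """Conditional p-value centred at the mode M = 0."""
    w_left = tri_cdf(0.0, a, b)
    if x < 0:
        return tri_cdf(x, a, b) / w_left
    if x > 0:
        return (1 - tri_cdf(x, a, b)) / (1 - w_left)
    return 1.0

def p_prob(x, a=A_LEFT, b=B_RIGHT):
    """Minimum likelihood p-value via the equal-density partner."""
    partner = -b * x / a if x < 0 else -a * x / b
    lo, hi = min(x, partner), max(x, partner)
    return tri_cdf(lo, a, b) + 1 - tri_cdf(hi, a, b)

grid = [-0.99 + 0.01 * i for i in range(399)]   # points in (-1, 3)
max_gap = max(abs(p_cond(x) - p_prob(x)) for x in grid)
print(max_gap)
```

The largest discrepancy over the grid is at the level of floating-point rounding, in line with the algebraic identity.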
Example: Uniform distribution

Consider a Unif(0, 1) distribution. This is a symmetric distribution with
$E = m = 1/2$, and $P_C = P_F = 2x$ for $x < 1/2$, and
$P_C = P_F = 2(1 - x)$ for $x > 1/2$, whereas $P_{prob} = 1$ for all
values of $x \in [0, 1]$.

This example shows the cardinal difference between the two p-values. $P_C$
acknowledges unusual values of x at the ends of the interval, and
$P_{prob}$ does not. This is a somewhat extreme example, because the
uniform distribution has a whole interval of modes. The next example deals
with a unimodal distribution but shows exactly the same properties of the
respective p-values.

Example: Left-truncated normal distribution

Denote the standard normal distribution function and density by $\Phi(x)$
and $\phi(x)$, respectively. Consider a normal distribution left-truncated
at $-L < 0$, $G_L(x) = (\Phi(x) - \Phi(-L))/(1 - \Phi(-L))$, defined for
$x > -L$. The mode is at zero. Then
$P_{prob}(x) = 2G_L(-|x|) + 1 - G_L(L)$ for $-L < x < L$, and
$P_{prob}(x) = 1 - G_L(x)$ for $x > L$. $P_{prob}$ reaches 1 at 0, and
$P_{prob}(L) = 1 - G_L(L)$; it is continuous at L, but its derivative is
not. The mean is $E = E(L) = \phi(-L)/(1 - \Phi(-L))$, and the conditional
p-value $P_C(x)$ reaches 1 at $E(L)$. An example for $L = 0.5$ is plotted
in the right plot in Figure 3. For this example $E(L) = 0.509$ and the
weight of the left tail is $w_L = 0.558$. The main difference between the
two p-values is that $P_{prob} > 1 - G_L(L)$ at the left tail, so even the
low values of x in the vicinity of $-L$ have rather high p-values. On the
other hand, $P_C$ is very close to zero for these values, recognizing that
it is rather unusual to get close to $-L$. It seems that a small two-sided
p-value at the left tail makes more sense.
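The truncated-normal example can be reproduced with the standard library alone. The sketch below uses the formulas of this example with $L = 0.5$; the function names are illustrative, and the point $x = -0.49$ is an arbitrary choice close to the truncation point.

```python
import math

L_TRUNC = 0.5

def norm_cdf(x):
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def norm_pdf(x):
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def trunc_cdf(x, L=L_TRUNC):
    """G_L: standard normal truncated on the left at -L."""
    if x <= -L:
        return 0.0
    return (norm_cdf(x) - norm_cdf(-L)) / (1 - norm_cdf(-L))

MEAN = norm_pdf(-L_TRUNC) / (1 - norm_cdf(-L_TRUNC))   # E(L)
W_LEFT = trunc_cdf(MEAN)                               # weight of the left tail

def p_prob(x, L=L_TRUNC):
    """Minimum likelihood p-value (mode at 0)."""
    if x <= L:
        return 2 * trunc_cdf(-abs(x), L) + 1 - trunc_cdf(L, L)
    return 1 - trunc_cdf(x, L)

def p_cond(x):
    """Conditional p-value centred at the mean E(L)."""
    if x < MEAN:
        return trunc_cdf(x) / W_LEFT
    if x > MEAN:
        return (1 - trunc_cdf(x)) / (1 - W_LEFT)
    return 1.0

x_near_edge = -0.49
print(MEAN, W_LEFT, p_prob(x_near_edge), p_cond(x_near_edge))
```

Near the truncation point $P_{prob}$ stays above the floor $1 - G_L(L)$, while $P_C$ is close to zero, which is the contrast the example describes.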

The above two examples show the properties of $P_C$ which are perhaps
clear from its definition: it compares a value x to other values at the
same tail. On the other hand, $P_{prob}$ depends on the values at both
tails. The same circumstances arise in the variance test example which was
introduced in the Introduction.

Figure 3: Plot of $P_{prob}(x)$ (solid line), $P_F(x)$ (dashed line), and
$P_C(x)$ (dotted line) for the $\chi^2(5)$ distribution (left plot) and
for a standard normal distribution truncated at -0.5 (right plot). The
plotted doubled p-value $P_F(x)$ is not truncated at 1.

Example: variance test based on the Chi-square distribution (continued).
Recall that for $X = 0.5$ ($s^2 = 0.1$) the 1-sided p-value is .0079, and
the value of X with equal density is $X'_{prob} = 9.256$ with the 1-sided
p-value of .0993. The mean is $E = 5$, and the conditional p-values are
$P_C^E(0.5) = 0.0135$ and $P_C^E(9.256) = 0.239$; the weight of the left
tail is $w_L = F(E) = 0.584$. The value with the same conditional p-value
on the opposite tail is
$X'_C = F^{-1}(1 - (1 - w_L)P_C(0.5)) = 16.48$, with the 1-sided p-value
of 0.0056. Clearly, $X'_C$ is more comparable to X than the value
$X'_{prob}$. The three p-values are plotted in the left plot in Figure 3.
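The conditional p-values in the continued variance example follow directly from Definition 2 with $A = E = 5$. The sketch below recomputes them; the helper names are hypothetical, the distribution function is the closed form for 5 degrees of freedom, and the quantile is found by bisection.

```python
import math

def chi2_5_cdf(x):
    """Closed-form distribution function of chi-square(5)."""
    if x <= 0:
        return 0.0
    return (math.erf(math.sqrt(x / 2))
            - math.sqrt(2 / math.pi) * math.exp(-x / 2)
            * (math.sqrt(x) + x ** 1.5 / 3))

def chi2_5_quantile(p, lo=0.0, hi=200.0, iterations=200):
    """Inverse distribution function by bisection."""
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if chi2_5_cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

MEAN = 5.0                      # mean of chi-square(5)
W_LEFT = chi2_5_cdf(MEAN)       # weight of the left tail, F(E)

def p_cond(x):
    """Conditional two-sided p-value centred at the mean."""
    if x < MEAN:
        return chi2_5_cdf(x) / W_LEFT
    if x > MEAN:
        return (1 - chi2_5_cdf(x)) / (1 - W_LEFT)
    return 1.0

x_obs = 0.5
# Right-tail point with the same conditional p-value as x_obs.
x_cond = chi2_5_quantile(1 - (1 - W_LEFT) * p_cond(x_obs))

print(W_LEFT, p_cond(x_obs), p_cond(9.256), x_cond, 1 - chi2_5_cdf(x_cond))
```

By construction `p_cond(x_cond)` equals `p_cond(x_obs)`, so the pair $(0.5, X'_C)$ is matched on tail-relative rarity rather than on density.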

The power of the three tests and of the UMPU test (all at 5% level) is
shown in the right plot in Figure 1. The UMPU test is the conditional test
with $A = 6.403$, corresponding to the weight $w_L = 0.731$. All three
tests are biased, with the bias B, defined as the minimum difference
between the power and the level, being $B_F = -0.0046$ for the doubled and
$B_C = -0.0020$ for the conditional test. This agrees with Lemma 1. The
doubled test is slightly less powerful on the right, and slightly more on
the left. The test based on $P_{prob}$ has very large bias and such low
power for the alternatives $\sigma < \sigma_0$ that it does not deserve to
be called a two-sided test.
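The bias figures can be approximated by evaluating the power formula from the Introduction, $\chi^2_5(\rho c_L) + 1 - \chi^2_5(\rho c_R)$ with $\rho = \sigma_0^2/\sigma^2$, on a grid. The sketch below is only a numerical check under assumptions of my own choosing: the grid for $\rho$ (0.5 to 2 in steps of 0.001) and the function names are not from the paper.

```python
import math

def chi2_5_cdf(x):
    """Closed-form distribution function of chi-square(5)."""
    if x <= 0:
        return 0.0
    return (math.erf(math.sqrt(x / 2))
            - math.sqrt(2 / math.pi) * math.exp(-x / 2)
            * (math.sqrt(x) + x ** 1.5 / 3))

def chi2_5_quantile(p, lo=0.0, hi=200.0, iterations=200):
    """Inverse distribution function by bisection."""
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if chi2_5_cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def power(rho, w_left, alpha=0.05):
    """Power of the test that puts alpha * w_left on the left tail."""
    c_left = chi2_5_quantile(alpha * w_left)
    c_right = chi2_5_quantile(1 - alpha * (1 - w_left))
    return chi2_5_cdf(rho * c_left) + 1 - chi2_5_cdf(rho * c_right)

def min_power(w_left, alpha=0.05):
    """Minimum power over an assumed grid of rho values in [0.5, 2]."""
    c_left = chi2_5_quantile(alpha * w_left)
    c_right = chi2_5_quantile(1 - alpha * (1 - w_left))
    rhos = (0.5 + 0.001 * i for i in range(1501))
    return min(chi2_5_cdf(r * c_left) + 1 - chi2_5_cdf(r * c_right)
               for r in rhos)

w_doubled = 0.5                     # equal tails
w_conditional = chi2_5_cdf(5.0)     # F(E), tails split at the mean

print(min_power(w_doubled) - 0.05, min_power(w_conditional) - 0.05)
```

The two printed differences correspond to $B_F$ and $B_C$; both are negative, with the conditional test closer to zero.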
The main difficulty associated with two-sided tests is that two equivalent
1-sided tests may result in distinct 2-sided tests. For the variance test
example the tests based on $|s^2 - \sigma_0^2|$ and on
$|\log(s^2/\sigma_0^2)|$ are not equivalent. Let $D(x, A)$ be a measure of
distance from A. It imposes an equivalence of points at the two sides of
A: each value $x < A$ has an equidistant value $x'_D$ such that
$D(x, A) = D(x'_D, A)$. Two-sided tests based on $|X - A|$ and $D(X, A)$
are not equivalent, generally speaking, because for $x < A$ the
equidistant value $x'_D \ne 2A - x$. This results in different rejection
regions for the two tests. The main advantages of the conditional p-value
$P_C^A(x)$ are given in the next Lemma.

Lemma 2 (i) For a strictly increasing function $T(x)$, the conditional
p-value $P_C(T(x) \mid T(A)) = P_C^A(x)$.

(ii) Suppose $D(x, A)$ strictly decreases for $x < A$ and strictly
increases for $x > A$, and $D(A, A) = 0$. Define the conditional p-value
for the distance $D(x, A)$ as
$P_C(D(x, A)) = P(D(X', A) > D(x, A) \mid X = x < A) + P(D(X', A) > D(x, A) \mid X = x > A)$.
Then $P_C(D(x, A)) = P_C(|x - A|)$.

The first statement of the lemma easily follows from the definition of
$P_C^A(x)$, and for the second statement take
$T(x) = D(x, A)\,\mathrm{sign}(x - A)$. This is a strictly increasing
function of x, and the proof follows from part (i).

The first part of the lemma ensures that equivalent 1-sided tests are
transformed into $P_C$-equivalent 2-sided tests. The second part states
that the 2-sided tests based on any measure of distance from A are
$P_C$-equivalent. This is true because the conditional p-value ignores any
equivalence between the points at different tails.
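Part (i) can be spelled out in two lines. The sketch below assumes the continuous setting of this section and writes G for the distribution function of $Y = T(X)$, a symbol introduced here only for the derivation.

```latex
% Y = T(X) with T strictly increasing, so G(y) = P(T(X) <= y) = F(T^{-1}(y)).
\begin{align*}
x < A:\quad
P_C\bigl(T(x) \mid T(A)\bigr)
  &= \frac{G(T(x))}{G(T(A))} = \frac{F(x)}{F(A)} = P_C^A(x), \\
x > A:\quad
P_C\bigl(T(x) \mid T(A)\bigr)
  &= \frac{1 - G(T(x))}{1 - G(T(A))} = \frac{1 - F(x)}{1 - F(A)} = P_C^A(x).
\end{align*}
```

For the variance test this means that the statistics $s^2$ and $\log s^2$ give the same conditional p-value, provided the centre is transformed along with the statistic.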



3 Discrete distributions 

In this section the 2-sided conditional p-value $P_C$ is defined for a
discrete distribution. It is also compared to $P_F$ and $P_{prob}$ for two
important cases: the binomial and hypergeometric distributions.

The definition of the conditional p-value $P_C$ (2) is also applicable in
a discrete case, but it may require a modification when the value A is
attainable. Since the value A then belongs to both tails, the previously
defined weights of the tails $w_L = P(X \le A)$ and $w_R = P(X \ge A)$ add
up to $1 + P(A) > 1$. The modified weights of the tails are
$w_L^m = P(X \le A)/(1 + P(A))$ and $w_R^m = P(X \ge A)/(1 + P(A))$. This
modification is akin to a continuity correction. The formal definition of
$P_C(x)$ is

Definition 4 Conditional two-sided p-value for a discrete distribution is

$P_C^A(x) = \frac{P(X \le x)}{w_L} I(x < A) + 1 \cdot I(x = A) + \frac{P(X \ge x)}{w_R} I(x > A),$   (4)

where the weights are $w_L = P(X \le A)$ and $w_R = P(X \ge A)$. The
modified conditional p-value $P_C^{m}(x)$ is defined with weights
$w_L^m = P(X \le A)/(1 + P(A))$ and $w_R^m = P(X \ge A)/(1 + P(A))$ in
equation (4).

The two definitions coincide when the value A is not attainable. In a
discrete symmetric case when $A = E = m$ is an attainable value, the
values of $P_C^m(x) = P_F(x)$ are doubled 1-sided values, and the values
of $P_C(x)$ are $(1 + P(A))$ times smaller, so the $P_C(x)$-based test is
more liberal. The conditional p-value has a mode of 1 at A when this value
is attainable, and two modes of 1 at the attainable values above and below
A when A is not an attainable value. It has a discrete uniform
distribution when restricted to values at a particular tail, though not
overall. In what follows we consider the case of $A = E$, and use the
notation $P_C(x) = P_C^E(x)$.

3.1 Binomial distribution 

For the Binom(n, p) distribution the mode is
$M = \lfloor (n + 1)p \rfloor = \lfloor E + p \rfloor$. When $(n + 1)p$ is
an integer, $M = (n + 1)p$ and $M - 1$ are both modes, and the mean
$E = np \in (M - 1, M)$ is unattainable. When E is an integer, $M = E$. In
all cases the distance $|M - E| < 1$. The median is one of
$\lfloor np \rfloor$ or $\lfloor np \rfloor \pm 1$. Consider first the
symmetric case $p = 0.5$. For odd n, the value $(n + 1)p$ is an integer,
both tails of the distribution have weight 0.5 and
$P_C(x) = P_{prob}(x)$. For even n, the mean $E = np$ is an integer,
$w_L > 0.5$, but $w_L^m = 0.5$. The unmodified version $P_C(x)$ is
symmetric about E with values $P_C(x) < P_{prob}(x)$ for $x \ne E$. The
modified version satisfies $P_C^m(x) = P_{prob}(x)$.

Statistical packages differ in regard to the 2-sided p-values for the
binomial test: R (R Development Core Team, 2004) uses $P_{prob}(x)$ and
StatXact (www.cytel.com) uses the doubled p-value.
The three p-values $P_C$, $P_C^m$ and $P_{prob}$ are plotted in Figure 4
for $p = 0.2$ and two values of n, $n = 10$ and $n = 11$. In the first
case $E = 2$ is an attainable value. It can be seen that $P_C^m > P_C$ on
the left plot. The weight of the left tail is $w_L = 0.678$ vs
$w_L^m = 0.521$. Consequently, $P_C^m(x) = 1.3 P_C(x)$ for all x but E.
The modified conditional p-value $P_C^m$ is considerably closer to
$P_{prob}$ at the left tail, and $P_{prob} < P_C < P_C^m$ at the right
(thin) tail. In fact, in this example for $n = 10$, $p = 0.2$, $P_{prob}$
provides exact 1-sided p-values for $x \ge 4$, with
$P_C(x) = 1.60 P_{prob}(x)$ and $P_C^m(x) = 2.09 P_{prob}(x)$ for
$x \ge 4$. So $P_{prob}(5) = 0.033$, $P_F(5) = 0.066$, $P_C(5) = 0.052$,
and $P_C^m(5) = 0.068$. The two-sided binomial test as programmed in R
uses $P_{prob}$ and would reject the null hypothesis of $p = 0.2$ at 5%
level given an observed value of 5, whereas a test based on the doubled or
conditional p-value would not reject. The same thing may happen for much
larger values of n. For example, for $n = 101$, $p = 0.1$ and the observed
value of $x = 17$ the values are $P_{prob} = 0.030$, $P_F = 0.06$ and
$P_C = P_C^m = 0.052$.
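The four two-sided p-values for the Binom(10, 0.2) example can be computed directly from Definition 4. This is a minimal sketch; the function name `two_sided_p_values`, the dictionary keys, the capping of the doubled p-value at 1, and the small relative tolerance used when comparing probabilities for $P_{prob}$ are choices made here, not specifications from the paper.

```python
from math import comb

def binom_pmf(k, n, p):
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

def two_sided_p_values(x, n, p):
    """Doubled, minimum likelihood, conditional and modified conditional
    p-values for an observed count x, with tails separated at the mean."""
    pmf = [binom_pmf(k, n, p) for k in range(n + 1)]
    lower = sum(pmf[: x + 1])            # P(X <= x)
    upper = sum(pmf[x:])                 # P(X >= x)
    mean = n * p
    attainable = abs(mean - round(mean)) < 1e-9
    w_left = sum(q for k, q in enumerate(pmf) if k <= mean + 1e-9)
    w_right = sum(q for k, q in enumerate(pmf) if k >= mean - 1e-9)
    # The modified weights divide by 1 + P(A) when the mean is attainable.
    inflate = 1 + pmf[round(mean)] if attainable else 1.0
    if attainable and x == round(mean):
        cond = cond_mod = 1.0
    elif x < mean:
        cond = lower / w_left
        cond_mod = cond * inflate
    else:
        cond = upper / w_right
        cond_mod = cond * inflate
    prob = sum(q for q in pmf if q <= pmf[x] * (1 + 1e-7))
    doubled = min(1.0, 2 * min(lower, upper))
    return {"doubled": doubled, "prob": prob,
            "cond": cond, "cond_mod": cond_mod}

result = two_sided_p_values(5, 10, 0.2)
print(result)
```

For the observation 5 only the minimum likelihood value falls below 0.05, which is the disagreement between packages described in the text.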

For $n = 11$ (right plot) $E = 2.2$ is not attainable. $P_C = P_C^m$ has
two modes at 2 and 3. Here $w_L = 0.617$ and $P_C(x) = 1.62 F(x)$ for
$x \le 2$, whereas $P_C(x) = 2.61(1 - F(x - 1))$ for $x \ge 3$.

Typically, $P_C(x) < P_{prob}(x)$ at the thick tail, and
$P_C(x) > P_{prob}(x)$ at the thin tail. Even for large n the difference
between $P_C$ and $P_{prob}$ is rather large. For example, for $n = 101$
and $p = 0.1$ the values are $P_C(17) = 0.052$ and
$P_{prob}(17) = 0.030$, in comparison to the 1-sided p-value of 0.023.

For the binomial distribution the weight of the tails converges to 0.5
rather slowly, and $P_C(x) \to P_F(x)$, see Table 1. Whenever the mean is
attainable, the weight of the thin right tail is also more than 0.5, and
$P_C(x) < P_F(x)$. If E is not attainable, the weight of the thin tail is
less than 0.5, and then $P_C > P_F$. This is always true for $P_C^m(x)$.
The distribution is more symmetric when the mean is attainable. Otherwise,
even for $n = 1001$, the weight of the left tail is $w_L = 0.522$ for
$p = 0.1$.
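The tail weights discussed above (and tabulated in Table 1) can be recomputed as follows. The sketch evaluates the binomial probabilities on the log scale through `math.lgamma`, an implementation choice made here so that n = 1001 poses no overflow problems; the function name `left_tail_weights` is illustrative.

```python
import math

def binom_pmf(k, n, p):
    """Binomial probability computed on the log scale for large n."""
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log(1 - p))
    return math.exp(log_pmf)

def left_tail_weights(n, p):
    """Return (w_L, w_L / w_R, modified w_L) with tails split at the mean."""
    mean = n * p
    nearest = round(mean)
    attainable = abs(mean - nearest) < 1e-9
    top = nearest if attainable else math.floor(mean)
    w_left = sum(binom_pmf(k, n, p) for k in range(top + 1))
    if attainable:
        p_mean = binom_pmf(nearest, n, p)
        w_right = 1 - w_left + p_mean      # the mean belongs to both tails
        w_left_mod = w_left / (1 + p_mean)
    else:
        w_right = 1 - w_left
        w_left_mod = w_left
    return w_left, w_left / w_right, w_left_mod

for n in (10, 11, 100, 101, 1000, 1001):
    print(n, [round(v, 3) for v in left_tail_weights(n, 0.1)],
          [round(v, 3) for v in left_tail_weights(n, 0.2)])
```

When the mean is not attainable the modified and unmodified weights coincide, which is why the corresponding columns of Table 1 repeat.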



3.2 Hypergeometric distribution

Consider a crosstabulation of two binary variables A and B. We shall refer
to the numbers of observations in the cells and the respective
probabilities as $n_{ij}$ and $p_{ij}$, $i, j = 1, 2$. The value $n_{11}$
is the statistic of Fisher's exact test used to test for association of A
and B given fixed margins $n_{1+}$, $n_{+1}$, n. The value $n_{11}$
defines all the other entries in a table with given margins. A parameter
of primary importance is the odds ratio
$\rho = p_{11}p_{22}/(p_{12}p_{21})$, estimated by
$\hat\rho = n_{11}n_{22}/(n_{12}n_{21})$. The case of no association,
$p_{ij} = p_{i+}p_{+j}$, is equivalent to $\rho = 1$. Denote the expected
values $m_{ij} = E(n_{ij}) = n_{i+}n_{+j}/n$, with
$E = m_{11} = E(n_{11})$. The number $n_{11} > E$ iff $\hat\rho > 1$.
Fisher (1935) derived the distribution of $n_{11}$ as

$f(n_{11}; n_{1+}, n_{+1}, \rho) = \binom{n_{1+}}{n_{11}} \binom{n - n_{1+}}{n_{+1} - n_{11}} \rho^{n_{11}} \Big/ \sum_u \binom{n_{1+}}{u} \binom{n - n_{1+}}{n_{+1} - u} \rho^u.$

The null distribution (standard hypergeometric) is for $\rho = 1$. For
testing $H_0 : \rho = 1$ vs $H_1 : \rho > 1$ the p-value is
$p_+ = \sum_{u \ge n_{11}} f(u; n_{1+}, n_{+1}, \rho)$ evaluated at
$\rho = 1$. For $H_1 : \rho < 1$ the p-value is
$p_- = \sum_{u \le n_{11}} f(u; n_{1+}, n_{+1}, \rho)$, again at
$\rho = 1$. For a two-sided test, $P_{prob}$ seems to be the p-value of
choice, implemented both in R and in StatXact.

Figure 4: Plot of $P_{prob}$ (solid line, circles), $P_C^m$ (long dash,
filled circles), $P_C$ (dotted line, squares) and $P_F$ (dashed line,
triangles) for the Binom(10, 0.2) distribution (left) and Binom(11, 0.2)
(right). On the right plot $P_C^m = P_C$.



Sometimes other one-sided test statistics are used to test for
association; they may be based on the differences of proportions in rows
or columns (e.g. $n_{11}/n_{+1} - n_{12}/n_{+2}$) or on $\log\hat\rho$.
Nevertheless, all other possible 1-sided tests are equivalent to Fisher's
exact test since their statistics are strictly increasing functions of
$n_{11}$, as shown by Davis (1986). Fisher's exact test is also the UMPU
test if randomization is allowed (Tocher, 1950).

              p = 0.1                        p = 0.2
    n      w_L    w_L/w_R   w_L^m        w_L    w_L/w_R   w_L^m
   10     0.736    1.130    0.531       0.678    1.086    0.521
   11     0.697    2.304    0.697       0.617    1.614    0.617
   20     0.677    1.113    0.527       0.630    1.070    0.517
   21     0.648    1.844    0.648       0.586    1.416    0.586
   50     0.616    1.083    0.520       0.584    1.049    0.512
   51     0.598    1.485    0.598       0.556    1.250    0.556
  100     0.583    1.063    0.515       0.559    1.036    0.509
  101     0.570    1.325    0.570       0.540    1.172    0.540
  200     0.559    1.046    0.511       0.542    1.026    0.507
  201     0.550    1.221    0.550       0.528    1.119    0.528
  500     0.538    1.030    0.507       0.527    1.017    0.504
  501     0.532    1.135    0.532       0.518    1.074    0.518
 1000     0.527    1.022    0.505       0.519    1.012    0.503
 1001     0.522    1.094    0.522       0.513    1.052    0.513

Table 1: Weight of the left tail $w_L = P(X \le A)$ and the ratio of the
weights of the two tails $w_L/w_R$ for the Binom(n, p) distribution;
$w_L^m$ stands for the modified weight $w_L^m = P(X \le A)/(1 + P(A))$.



For the $Hyper(x; n_{1+}, n_{+1}, n)$ distribution, the full range of
values x for fixed margins $(n_{1+}, n_{+1}, n)$ is
$\{x = m_-, \dots, m_+\}$, where $m_- = \max(0, n_{1+} + n_{+1} - n)$ and
$m_+ = \min(n_{1+}, n_{+1})$. The mode is
$M = \lfloor (n_{1+} + 1)(n_{+1} + 1)/(n + 2) \rfloor = \lfloor \frac{n}{n+2}\,(p_{1+}(1 - p_{+1}) + p_{+1}(1 - p_{1+}) + 1/n) + E \rfloor$.
Therefore $\lfloor E \rfloor \le M \le \lfloor E + 1/2 \rfloor$. When
$(n_{1+} + 1)(n_{+1} + 1)/(n + 2)$ is an integer, $M - 1$ and M are both
modes and the mean $E \in (M - 1, M)$ is unattainable. When E is an
integer, $M = E$. In all cases the distance $|M - E| < 1$.

Exact 2-sided tests for association are used when both positive and
negative associations are of interest. However, there is ongoing
controversy about [...]tion (Yates [...] 1996).



Davis (1986) compares the p-values associated with the following 6
statistics: $T_1 = -P(n_{11})$,
$T_2 = |n_{11}/n_{+1} - n_{12}/n_{+2}| = n|n_{11} - m_{11}|/(n_{+1}n_{+2})$,
$T_3$ (a second difference-of-proportions statistic), 
$T_4 = |\log(\hat\rho)|$,
$T_5 = \sum_{ij}(n_{ij} - m_{ij})^2/m_{ij} = n^3(n_{11} - m_{11})^2/(n_{1+}n_{2+}n_{+1}n_{+2})$,
$T_6 = 2\sum_{ij} n_{ij}\log(n_{ij}/m_{ij})$. Statistic $T_1$ orders the
tables according to their probability, and corresponds to a test based on
$P_{prob}$; $T_2$ and $T_3$ are standard large-sample tests for
homogeneity of proportions; $T_4$ (Hill and Pike, 1965) rejects for small
and large values of the observed log-odds ratio; $T_5$ is Pearson's
chi-square test statistic; and $T_6$ is the likelihood ratio statistic
(Agresti and Wackerly, 1977). It can be seen that $T_2$, $T_3$ and $T_5$
are strictly increasing functions of $|n_{11} - m_{11}|$, and therefore
the p-values for them do not differ. Further, all of the statistics $T_j$,
$j = 1, \dots, 6$, are strictly decreasing functions of $n_{11}$ for
$n_{11} \le m_{11}$, and strictly increasing functions of $n_{11}$ for
$n_{11} > m_{11}$. Davis (1986) further shows that the 2-sided tests
$T_1$, $T_4$, $T_5$ and $T_6$ are not equivalent due to differing ordering
of the tables at the opposite tail.

Consider the table with margins
$(n_{1+}, n_{2+}, n_{+1}, n_{+2}) = (9, 21, 5, 25)$ used as an example in
Davis (1986). The possible $n_{11}$ values are 0 through 5, and
$E(n_{11}) = 1.5$, so the left tail has two tables only, for
$n_{11} = 0$ and 1, with the total probability of $w_L = .521$. Tables
with $n_{11} = 2, \dots, 5$ are on the right tail; the total probability
is $w_R = .479$. The two tails are rather close in probability. Davis
(1986) looks at the orderings of tables according to the increasing values
of test statistics, as follows:



Ti 



1 2 3 4 5 

2 1 3 4 5 
{1 2} {0 3} 4 
2 1 3 4 5 



Due to the monotonicity of all statistics $T_j$, $j = 1, \ldots, 6$, on both
sides of the mean $m_{11}$, the conditional p-values for all 6 statistics do not
differ (Lemma 2). Therefore all six 2-sided tests are equivalent when they are
based on the conditional p-value. This is the main advantage of the conditional
p-value for the hypergeometric distribution. Fisher's exact test is usually
superseded by the chi-square test for large cell numbers. Equivalence of these
two tests is of practical importance, for example when testing for linkage
disequilibrium in genetics.
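
A small numerical check of this equivalence can be sketched as follows. It assumes that the conditional p-value of an observed table is the probability of the tables on the same side of the mean whose statistic is at least as large, divided by the total probability of that side (the definition itself is given earlier in the paper and is not restated here), and that the mean is not an attainable value. The function names are hypothetical.

```python
from math import comb, log

N1P, N2P, NP1 = 9, 21, 5          # row totals and first column total
N = N1P + N2P
MEAN = N1P * NP1 / N              # E(n11) = 1.5, not an attainable value here

def pmf(k):
    """Hypergeometric probability of the table with n11 = k."""
    return comb(N1P, k) * comb(N2P, NP1 - k) / comb(N, NP1)

def cells_and_means(k):
    """Pairs (n_ij, m_ij) for the table with n11 = k."""
    cells = (k, N1P - k, NP1 - k, N2P - NP1 + k)
    np2 = N - NP1
    means = (N1P * NP1 / N, N1P * np2 / N, N2P * NP1 / N, N2P * np2 / N)
    return list(zip(cells, means))

STATS = {
    "T1": lambda k: -pmf(k),
    "T3": lambda k: abs(k - MEAN),
    "T5": lambda k: sum((c - m) ** 2 / m for c, m in cells_and_means(k)),
    "T6": lambda k: 2 * sum(c * log(c / m) for c, m in cells_and_means(k) if c > 0),
}

def conditional_p(stat, x, support):
    """Tail probability of stat on the observed side of the mean, renormalised by that side."""
    side = [k for k in support if (k < MEAN) == (x < MEAN)]
    tail = sum(pmf(k) for k in side if stat(k) >= stat(x))
    return tail / sum(pmf(k) for k in side)

support = range(max(0, NP1 - N2P), min(N1P, NP1) + 1)
p_values = {name: [conditional_p(stat, x, support) for x in support]
            for name, stat in STATS.items()}
for name, values in p_values.items():
    print(name, [round(v, 3) for v in values])
```

Under these assumptions the four statistics give the same list of conditional p-values, because each of them is strictly monotone on either side of the mean and so selects the same tables within each tail.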



The probabilities of the 6 tables, along with their one-sided p-values and
their $P_{prob}$ and $P_c$ values, are given in columns 2-5 of Table 2.
Conditional p-values are very close to doubled 1-sided p-values. The second set
of tables in Table 2 corresponds to margins
$(n_{1+}, n_{2+}, n_{+1}, n_{+2}) = (9, 31, 5, 35)$. Here the left tail
probability is 0.689, and the thin right tail has probability 0.311. The
probabilities and the p-values are given in columns 6-9. Here the inflation of
the conditional p-values on the right tail is more prominent.

         margins (9, 21, 5, 25)            margins (9, 31, 5, 35)
n11   P(n11)  P1-sided  Pprob  Pc       P(n11)  P1-sided  Pprob  Pc
0     .143    .143      .286   .274     .258    .258      .570   .374
1     .378    .521      1      1        .430    .689      1      1
2     .336    .479      .622   1        .246    .311      .311   1
3     .124    .143      .143   .299     .059    .065      .065   .209
4     .019    .019      .019   .040     .006    .006      .006   .028
5     .001    .001      .001   .002     .0002   .0002     .0002  .0006

Table 2: The 6 possible tables, their probabilities and various p-values for
Fisher's exact test. Columns 2-5 are for a table with margins
(n1+, n2+, n+1, n+2) = (9, 21, 5, 25). The same information for a table with
margins (n1+, n2+, n+1, n+2) = (9, 31, 5, 35) is given in columns 6-9.
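
The columns of Table 2 can be recomputed from the hypergeometric distribution. The sketch below is one way to do it under stated assumptions: the one-sided p-value is taken in the direction of the observed side of the mean, $P_{prob}$ sums the probabilities of all tables no more likely than the observed one, and $P_c$ divides the one-sided p-value by the total probability of that tail. The function name `table2_rows` and the relative tolerance used when comparing probabilities are choices made for this illustration only.

```python
from math import comb

def table2_rows(n1p, n2p, np1, rel_tol=1e-7):
    """Rows (n11, P(n11), one-sided p, Pprob, Pc) for all tables with the given margins.

    n1p, n2p are the row totals and np1 the first column total.
    Assumes the mean of n11 is not an attainable value, as for both margin sets in Table 2.
    """
    n = n1p + n2p
    support = range(max(0, np1 - n2p), min(n1p, np1) + 1)
    prob = {k: comb(n1p, k) * comb(n2p, np1 - k) / comb(n, np1) for k in support}
    mean = n1p * np1 / n
    w_left = sum(p for k, p in prob.items() if k < mean)    # total weight of the left tail
    w_right = sum(p for k, p in prob.items() if k > mean)   # total weight of the right tail
    rows = []
    for x in support:
        if x < mean:
            one_sided = sum(p for k, p in prob.items() if k <= x)
            weight = w_left
        else:
            one_sided = sum(p for k, p in prob.items() if k >= x)
            weight = w_right
        # Minimum likelihood p-value: all tables at most as probable as the observed one.
        p_prob = sum(p for p in prob.values() if p <= prob[x] * (1 + rel_tol))
        rows.append((x, prob[x], one_sided, min(1.0, p_prob), one_sided / weight))
    return rows

for margins in [(9, 21, 5), (9, 31, 5)]:
    for row in table2_rows(*margins):
        print(margins, row[0], *(f"{v:.4f}" for v in row[1:]))
```

For the margins (9, 21, 5, 25) the output agrees with columns 2-5 of Table 2 up to rounding in the last reported digit.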



4 Discussion 



Two-sided testing in non-symmetric distributions is not straightforward. The
UMPU tests are not implemented in the mainstream software packages even
for continuous problems, and require randomization in the discrete case. The
non-asymptotic GLR tests are also not implemented and are, in general,
biased (Bar-Lev et al., 2002). At the same time, two-sided tests are the
staple in all applications. The importance of a conceptually and
computationally simple approach to two-sided testing is self-evident.

The conditional 2-sided p-value $P_c$ introduced in Section 1 is closely
related to the doubled p-value and has an intuitive appeal. Its use is
advocated for both continuous and discrete distributions. An important
advantage of this p-value is that equivalent 1-sided tests are transformed
into $P_c$-equivalent 2-sided tests. This helps to resolve the ongoing
controversy about which 2-sided tests should be used for association in
2 by 2 tables.



The properties of this p-value compare favorably to those of the doubled
p-value and of the minimum likelihood p-value $P_{prob}$, the two main options
implemented in statistical tests for non-symmetric distributions. For the
variance test, the bias of the $P_c$-based test is smaller than the bias of the
standard equal tails test based on the doubled p-value, and much smaller than
the bias of the $P_{prob}$-based test. For considerably unbalanced sample
sizes, the $P_c$-based test is also less biased than the equal tails F-test of
the equality of variances.

We did not compare the power and the bias of the resulting tests for the
binomial and the hypergeometric cases. This is difficult to do for tests at
different levels without recourse to randomisation. For asymptotically normal
tests, both p-values should result in asymptotically UMPU tests, though
the minimum likelihood p-value may require more stringent conditions to
ensure the convergence of the density to the normal density. The proof of
these statements is a matter for further research.

Another open question is which of the two versions of the conditional
p-value, $P_c(x)$ or its alternative, should be used when A is an attainable
value. The motivation for the alternative version is less clear; it also
results in a more conservative test, on top of the inescapable conservativeness
due to the discreteness of the distribution.



Gibbons and Pratt (1975) consider a large number of 2-sided p-values and
find them lacking. They recommend reporting the one-tailed p-value together
with the direction of the observed departure from the null hypothesis. In this
spirit, the conditional p-value conditions on this direction.